Discontinuous Statistical Machine Translation with Target-Side Dependency Syntax
Authors
Abstract
For several languages, only potentially non-projective dependency parses are readily available. Projectivizing these parses and using them in syntax-based translation systems often yields particularly poor translation results, which indicates that those translation models cannot properly exploit such information. We demonstrate that our system, based on multi bottom-up tree transducers that natively handle discontinuities, avoids the large deterioration in translation quality. It also achieves the best performance of all classical syntax-based translation systems and closes the gap to phrase-based and hierarchical systems that do not use syntax.
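The abstract mentions projectivizing non-projective dependency parses. The sketch below illustrates that step and is our assumption, not the authors' pipeline. It checks whether an arc is projective, meaning the head dominates every word between head and dependent. It then projectivizes a tree by repeatedly lifting the shortest non-projective arc to the dependent's grandparent, which is a simplified form of the arc lifting used in pseudo-projective parsing. The head-list encoding (heads[i] is the head of token i+1, with 0 as the artificial root) and all function names are hypothetical. The input is assumed to be a well-formed tree.

```python
from typing import List


def dominates(heads: List[int], h: int, k: int) -> bool:
    """True if node h is an ancestor of (or equal to) node k; node 0 is the artificial root."""
    cur = k
    while cur != 0:
        if cur == h:
            return True
        cur = heads[cur - 1]  # climb to the parent (assumes the tree has no cycles)
    return h == 0  # the root dominates every node


def is_projective_arc(heads: List[int], dep: int) -> bool:
    """An arc head -> dep is projective if the head dominates every word strictly between them."""
    h = heads[dep - 1]
    lo, hi = min(h, dep), max(h, dep)
    return all(dominates(heads, h, k) for k in range(lo + 1, hi))


def is_projective(heads: List[int]) -> bool:
    """A tree is projective if all of its arcs are projective."""
    return all(is_projective_arc(heads, d) for d in range(1, len(heads) + 1))


def projectivize(heads: List[int]) -> List[int]:
    """Lift the shortest non-projective arc to the dependent's grandparent until none remain.

    Arcs from the root are always projective, so a lifted arc always has a real head,
    and reattaching to an ancestor can never create a cycle.
    """
    heads = list(heads)  # work on a copy; do not mutate the caller's list
    while True:
        bad = [d for d in range(1, len(heads) + 1) if not is_projective_arc(heads, d)]
        if not bad:
            return heads
        d = min(bad, key=lambda x: abs(heads[x - 1] - x))  # shortest offending arc
        heads[d - 1] = heads[heads[d - 1] - 1]  # reattach dependent to its grandparent


if __name__ == "__main__":
    # Token 3 depends on token 1 across token 2, which token 1 does not dominate.
    tree = [0, 0, 1]
    print(is_projective(tree))  # False
    print(projectivize(tree))   # [0, 0, 0]
```

Lifting changes the attachment of the dependent and therefore loses some of the original syntactic information. This kind of information loss is the motivation for the abstract's case for a model that handles discontinuities natively.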
Similar resources
A Dependency Treelet String Correspondence Model for Statistical Machine Translation
This paper describes a novel model using dependency structures on the source side for syntax-based statistical machine translation: Dependency Treelet String Correspondence Model (DTSC). The DTSC model maps source dependency structures to target strings. In this model translation pairs of source treelets and target strings with their word alignments are learned automatically from the parsed and...
Phrase-Boundary Translation Model Using Shallow Syntactic Labels
The phrase-boundary model for statistical machine translation labels rules with the classes of the boundary words of target-side phrases in the training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels, including POS tags and chunk labels. Giving priority to chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...
A Source Dependency Model for Statistical Machine Translation
In formally syntax-based MT, a hierarchical tree generated by synchronous CFG rules associates the source sentence with the target sentence. In this paper, we propose a source dependency model to estimate the probability of the hierarchical tree generated in decoding. We develop this source dependency model from a word-aligned corpus, without using any linguistically motivated parsing. Our ex...
A Synchronous Context Free Grammar using Dependency Sequence for Syntax-based Statistical Machine Translation
We introduce a novel translation rule that captures discontinuous, partial constituent, and non-projective phrases from source language. Using the traversal order sequences of the dependency tree, our proposed method 1) extracts the synchronous rules in linear time and 2) combines them efficiently using the CYK chart parsing algorithm. We analytically show the effectiveness of this translation ...
Statistical Machine Translation of English – Manipuri using Morpho-syntactic and Semantic Information
The English–Manipuri language pair is one of the rarely investigated ones and has restricted bilingual resources. We report the development of a factored Statistical Machine Translation (SMT) system with English as the source language and Manipuri, a morphologically rich language, as the target. The role of the suffixes and dependency relations on the source side and case markers on the target side are identified as im...